
    Bayesian correction for covariate measurement error: a frequentist evaluation and comparison with regression calibration

    Bayesian approaches for handling covariate measurement error are well established, yet arguably still relatively little used by researchers. For some this is likely due to unfamiliarity or disagreement with the Bayesian inferential paradigm. For others a contributory factor is the inability of standard statistical packages to perform such Bayesian analyses. In this paper we first give an overview of the Bayesian approach to handling covariate measurement error and contrast it with regression calibration (RC), arguably the most commonly adopted approach. We then argue that the Bayesian approach has a number of statistical advantages over RC, and demonstrate that implementing it is usually quite feasible for the analyst. Next we describe the closely related maximum likelihood and multiple imputation approaches, and explain why we believe the Bayesian approach is generally preferable. We then empirically compare the frequentist properties of RC and the Bayesian approach through simulation studies. The flexibility of the Bayesian approach in handling both measurement error and missing data is illustrated through an analysis of data from the Third National Health and Nutrition Examination Survey.
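
    The three-part structure described here (an outcome model, a measurement model, and an exposure model for the latent true exposure) maps directly onto modern probabilistic programming tools. Below is a minimal, hypothetical PyMC sketch of a Bayesian correction for classical covariate measurement error with a binary outcome; the variable names, priors, and the assumed error standard deviation are illustrative, not the paper's implementation.

        # Hypothetical sketch: Bayesian correction for classical measurement
        # error. In practice the error SD would be estimated, e.g. from
        # replicate measurements, rather than fixed at an assumed value.
        import numpy as np
        import pymc as pm

        rng = np.random.default_rng(1)
        n = 500
        x = rng.normal(0.0, 1.0, n)                     # true exposure (latent)
        w = x + rng.normal(0.0, 0.5, n)                 # error-prone measurement
        y = rng.binomial(1, 1 / (1 + np.exp(0.5 - x)))  # binary outcome

        with pm.Model():
            # Exposure model: prior for the unobserved true exposure.
            mu_x = pm.Normal("mu_x", 0.0, 10.0)
            sd_x = pm.HalfNormal("sd_x", 5.0)
            x_true = pm.Normal("x_true", mu_x, sd_x, shape=n)
            # Measurement model: classical error with assumed SD 0.5.
            pm.Normal("w_obs", x_true, 0.5, observed=w)
            # Outcome model: logistic regression on the true exposure.
            alpha = pm.Normal("alpha", 0.0, 10.0)
            beta = pm.Normal("beta", 0.0, 10.0)
            pm.Bernoulli("y_obs", logit_p=alpha + beta * x_true, observed=y)
            idata = pm.sample(1000, tune=1000)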

    A guide to interpreting estimated median age of survival in cystic fibrosis patient registry reports.

    Survival statistics, estimated using data collected by national cystic fibrosis (CF) patient registries, are used to inform the CF community and to monitor survival of CF populations. Annual registry reports typically give the median age of survival, though different registries use different estimation approaches and terminology, which has created confusion for the community. In this article we explain how the median age of survival is estimated, how it should be interpreted, and what assumptions and limitations are involved. Information on survival from birth is less useful for individuals who have already reached a certain age, and we propose the use of conditional survivor curves to address this. We provide recommendations for CF registries with the aim of facilitating clear and consistent reporting of survival statistics. Our recommendations are illustrated using data from the UK Cystic Fibrosis Registry.
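
    As a concrete illustration of the quantities discussed, the sketch below estimates a median age of survival and a conditional survivor curve S(t)/S(t0) from a Kaplan-Meier fit, using the lifelines Python package on synthetic data rather than registry data; all numbers are placeholders.

        # Toy sketch: median survival and a conditional survivor curve.
        import numpy as np
        from lifelines import KaplanMeierFitter

        rng = np.random.default_rng(2)
        ages = rng.exponential(40.0, 300)     # synthetic survival times (years)
        died = rng.random(300) < 0.7          # event indicator (False = censored)

        kmf = KaplanMeierFitter()
        kmf.fit(ages, event_observed=died)
        print("Estimated median age of survival:", kmf.median_survival_time_)

        # Conditional survivor curve for someone who has already reached age t0:
        # S(t | T > t0) = S(t) / S(t0), for t >= t0.
        t0 = 20.0
        grid = np.arange(t0, 80.0, 1.0)
        conditional_surv = kmf.predict(grid) / kmf.predict(t0)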

    A toolkit for measurement error correction, with a focus on nutritional epidemiology.

    Exposure measurement error is a problem in many epidemiological studies, including those using biomarkers and measures of dietary intake. Measurement error typically results in biased estimates of exposure-disease associations, with the severity and nature of the bias depending on the form of the error. To correct for the effects of measurement error, information additional to the main study data is required. Ideally, this is a validation sample in which the true exposure is observed. However, in many situations it is not feasible to observe the true exposure, but one or more repeated exposure measurements may be available, for example blood pressure or dietary intake recorded at two time points. The aim of this paper is to provide a toolkit for measurement error correction using repeated measurements. We bring together methods covering classical measurement error and several departures from classical error: systematic, heteroscedastic and differential error. The correction methods considered are regression calibration, which is already widely used in the classical error setting, and moment reconstruction and multiple imputation, which are newer approaches with the ability to handle differential error. We emphasize practical application of the methods in nutritional epidemiology and other fields. We primarily consider continuous exposures in the exposure-outcome model, but we also outline methods for use when continuous exposures are categorized. The methods are illustrated using data from a study of the association between fibre intake and colorectal cancer, in which fibre intake is measured using a diet diary and repeated measures are available for a subset of participants.
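
    For the classical error case, regression calibration with a repeat measurement reduces to a few lines, because under classical error the covariance of two replicates estimates the variance of the true exposure. The numpy sketch below is a minimal illustration under those assumptions, with made-up parameter values; it is not the toolkit's code.

        # Sketch: regression calibration from two replicate measurements.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 2000
        x = rng.normal(20.0, 5.0, n)        # true exposure (e.g. fibre intake)
        w1 = x + rng.normal(0.0, 4.0, n)    # first measurement, classical error
        w2 = x + rng.normal(0.0, 4.0, n)    # repeat measurement

        # cov(w1, w2) estimates var(X), so the attenuation (regression
        # dilution) factor is lambda = var(X) / var(W).
        lam = np.cov(w1, w2)[0, 1] / np.var(w1, ddof=1)
        x_cal = w1.mean() + lam * (w1 - w1.mean())   # approximates E[X | W1]

        # The exposure-outcome model is then fitted using x_cal in place of
        # w1; for a linear model this equals dividing the naive slope by lam.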

    Using full-cohort data in nested case-control and case-cohort studies by multiple imputation.

    In many large prospective cohorts, expensive exposure measurements cannot be obtained for all individuals. Exposure-disease association studies are therefore often based on nested case-control or case-cohort studies, in which complete information is obtained only for sampled individuals. However, in the full cohort there may be a large amount of information on cheaply available covariates, and possibly a surrogate of the main exposure(s), which typically goes unused. We view the nested case-control or case-cohort study plus the remainder of the cohort as a full-cohort study with missing data. Hence, we propose using multiple imputation (MI) to utilise information in the full cohort when data from the sub-studies are analysed. We use the fully observed data to fit the imputation models. We consider using approximate imputation models, and also using rejection sampling to draw imputed values from the true distribution of the missing values given the observed data. Simulation studies show that using MI to utilise full-cohort information in the analysis of nested case-control and case-cohort studies can result in important gains in efficiency, particularly when a surrogate of the main exposure is available in the full cohort. In simulations, this method outperforms counter-matching in nested case-control studies and a weighted analysis for case-cohort studies, both of which use some full-cohort information. Approximate imputation models perform well, except when there are interactions or non-linear terms in the outcome model, in which case imputation using rejection sampling works well.
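
    The rejection-sampling idea mentioned above is simple to state: to impute a missing exposure from its true conditional distribution f(x | y, z), which is proportional to P(y | x, z) g(x | z), propose from the approximate imputation model g and accept with probability P(y | x, z), which is at most 1 for a binary outcome. The sketch below is a generic illustration with made-up model forms and parameter values, not the paper's implementation.

        # Generic sketch: imputation by rejection sampling.
        import numpy as np

        rng = np.random.default_rng(4)

        def impute_exposure(y, z, alpha=-1.0, beta=0.8, gamma=0.5):
            """Draw one imputed exposure from f(x | y, z)."""
            while True:
                x = rng.normal(gamma * z, 1.0)                 # proposal g(x | z)
                p = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))  # P(Y = 1 | x, z)
                accept_prob = p if y == 1 else 1.0 - p
                if rng.random() < accept_prob:
                    return x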

    Handling missing data in matched case-control studies using multiple imputation.

    Analysis of matched case-control studies is often complicated by missing data on covariates. Analysis can be restricted to individuals with complete data, but this is inefficient and may be biased. Multiple imputation (MI) is an efficient and flexible alternative. We describe two MI approaches. The first uses a model for the data on an individual and includes the matching variables; the second uses a model for the data on a whole matched set and avoids the need to model the matching variables. Within each approach, we consider three methods: fully conditional specification (FCS), joint model MI using a normal model, and joint model MI using a latent normal model. We show that FCS MI is asymptotically equivalent to joint model MI using a restricted general location model that is compatible with the conditional logistic regression analysis model. The normal and latent normal imputation models are not compatible with this analysis model. All methods allow for multiple partially observed covariates, non-monotone missingness, and multiple controls per case. They can be easily applied in standard statistical software, and valid variance estimates can be obtained using Rubin's rules. We compare the methods in a simulation study. The approach of including the matching variables is the more efficient of the two. Within each approach, the FCS MI method generally yields the least biased odds ratio estimates, but normal or latent normal joint model MI is sometimes more efficient. All methods have good confidence interval coverage. Data on colorectal cancer and fibre intake from the EPIC-Norfolk study are used to illustrate the methods, in particular showing how efficiency is gained relative to using only individuals with complete data.
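
    Once the M imputed datasets have been analysed (here, by conditional logistic regression), the pooling step is the same in every MI application. A minimal sketch of Rubin's rules:

        # Rubin's rules: pool point estimates and variances across M
        # imputed-data analyses.
        import numpy as np

        def rubins_rules(estimates, variances):
            q = np.asarray(estimates, dtype=float)
            u = np.asarray(variances, dtype=float)
            m = len(q)
            q_bar = q.mean()                 # pooled point estimate
            w = u.mean()                     # within-imputation variance
            b = q.var(ddof=1)                # between-imputation variance
            t = w + (1.0 + 1.0 / m) * b      # total variance
            return q_bar, t

        # e.g. pooling log odds ratios from M = 3 imputations (made-up numbers):
        # est, var = rubins_rules([0.52, 0.48, 0.55], [0.040, 0.045, 0.042])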

    Simulating data from marginal structural models for a survival time outcome

    Marginal structural models (MSMs) are often used to estimate causal effects of treatments on survival time outcomes from observational data when time-dependent confounding may be present. They can be fitted using, for example, inverse probability of treatment weighting (IPTW). It is important to evaluate the performance of statistical methods in different scenarios, and simulation studies are a key tool for such evaluations. In these studies it is common to generate data in such a way that the model of interest is correctly specified, but this is not always straightforward when the model of interest is for potential outcomes, as an MSM is. Methods have been proposed for simulating from MSMs for a survival outcome, but these methods impose restrictions on the data-generating mechanism. Here we propose a method that overcomes these restrictions. The MSM can be a marginal structural logistic model for a discrete survival time, or a Cox or additive hazards MSM for a continuous survival time. The hazard of the potential survival time can be conditional on baseline covariates, and the treatment variable can be discrete or continuous. We illustrate the use of the proposed simulation algorithm by carrying out a brief simulation study, which compares the coverage of confidence intervals, calculated in two different ways, for causal effect estimates obtained by fitting an MSM via IPTW.
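
    To see what "correctly specified MSM" means in the simplest case, the toy sketch below generates data whose potential survival times follow a Cox MSM conditional on a baseline covariate, for a binary point treatment. It deliberately omits time-dependent confounding, which is the harder setting the paper's algorithm addresses, and all parameter values are made up.

        # Toy sketch: data consistent with a Cox MSM conditional on V.
        # The potential survival time T(a) has hazard lam0*exp(beta*a + g*v),
        # so with a constant baseline hazard it is exponential.
        import numpy as np

        rng = np.random.default_rng(5)
        n, lam0, beta, g = 10_000, 0.05, -0.7, 0.4

        v = rng.normal(0.0, 1.0, n)                  # baseline covariate
        a = rng.binomial(1, 1 / (1 + np.exp(-v)))    # treatment depends on V
        t = rng.exponential(1 / (lam0 * np.exp(beta * a + g * v)))

        # Here a Cox regression of t on (a, v) recovers beta, because V is the
        # only confounder and appears in the MSM; IPTW becomes necessary when
        # the MSM omits confounders, e.g. a marginal MSM or a time-varying
        # treatment with time-dependent confounding.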

    Big data: Some statistical issues.

    A broad review is given of the impact of big data on various aspects of investigation, with some, though not exclusive, emphasis on issues arising in epidemiological research.

    Emulated trial investigating effects of multiple treatments: estimating combined effects of mucoactive nebulisers in cystic fibrosis using registry data

    Introduction: People with cystic fibrosis (CF) are often on multiple long-term treatments, including mucoactive nebulisers. In the UK, the most common mucoactive nebuliser is dornase alfa (DNase). A common therapeutic approach for people already on DNase is to add hypertonic saline (HS). The effects of DNase and HS used alone have been studied in randomised trials, but their effects in combination have not. This study investigates whether, for people already prescribed DNase, adding HS has additional benefit for lung function or use of intravenous antibiotics.

    Methods: Using UK CF Registry data from 2007 to 2018, we emulated a target trial. We included people aged 6 years and over who had been prescribed DNase without HS for 2 years. We investigated the effects of combinations of DNase and HS over 5 years of follow-up. Inverse-probability-of-treatment weighting was used to control confounding. The study period predated the use of triple combination CF transmembrane conductance regulator modulators in routine care.

    Results: 4498 individuals were included. At baseline, mean age and mean percent predicted forced expiratory volume in 1 s (FEV1%) were 21.1 years and 69.7, respectively. During the first year of follow-up, 3799 individuals were prescribed DNase alone, 426 added HS, 57 switched to HS alone, and 216 were prescribed neither. We found no evidence that adding HS improved FEV1% at 1–5 years, or affected use of intravenous antibiotics at 1–4 years, compared with DNase alone.

    Conclusion: For individuals with CF prescribed DNase, we found no evidence that adding HS had an effect on FEV1% or prescription of intravenous antibiotics. Our study illustrates the emulated target trial approach using CF Registry data.
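
    The confounding-control step in such an emulation is inverse-probability-of-treatment weighting. The sketch below shows the point-treatment version on synthetic data: fit a propensity model, form stabilized weights, and fit a weighted outcome model. It is a simplified stand-in, since the study itself weights over repeated treatment decisions during follow-up; all variable names and values are invented.

        # Hypothetical point-treatment IPTW sketch (synthetic data).
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(6)
        n = 1000
        fev1_base = rng.normal(70.0, 15.0, n)             # baseline FEV1%
        p_hs = 1 / (1 + np.exp(0.03 * (fev1_base - 70)))  # sicker -> more likely
        add_hs = rng.binomial(1, p_hs)                    # 1 = HS added to DNase
        fev1_yr1 = fev1_base - 1.0 + rng.normal(0, 5, n)  # null treatment effect

        # Propensity scores and stabilized weights.
        X = sm.add_constant(fev1_base)
        ps = sm.Logit(add_hs, X).fit(disp=0).predict(X)
        sw = np.where(add_hs == 1, add_hs.mean() / ps,
                      (1 - add_hs.mean()) / (1 - ps))

        # Weighted regression estimates the marginal effect of adding HS;
        # in practice robust (sandwich) standard errors would be used.
        fit = sm.WLS(fev1_yr1, sm.add_constant(add_hs), weights=sw).fit()
        print(fit.params)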

    Effects of Classical Exposure Measurement Error on the Shape of Exposure-Disease Associations

    In epidemiology, many exposures of interest are measured with error. Random, or 'classical', error in exposure measurements attenuates linear exposure-disease associations. However, its precise effects on different nonlinear associations are not well known. We use simulation studies to assess how classical measurement error affects observed association shapes and the power to detect nonlinearity. We focus on a proportional hazards model for the exposure-disease association and consider six true association shapes of relevance in epidemiology: linear, threshold, U-shaped, J-shaped, increasing quadratic, and asymptotic. The association shapes are modeled using three popular methods: grouped exposure analyses, fractional polynomials, and P-splines. Under each true association shape and each method, we illustrate the effects of classical exposure measurement error, considering varying degrees of random error. We also assess what we refer to as MacMahon's method for correcting for classical exposure measurement error under grouped exposure analyses, which uses replicate measurements to estimate usual exposure within observed exposure groups. The validity of this method for nonlinear associations has not previously been investigated. Under nonlinear exposure-disease associations, classical measurement error results in increasingly linear shapes, and not always an attenuated association at a given exposure level. Fractional polynomials and P-splines give similar results and offer advantages over grouped exposure analyses by providing realistic models. P-splines offer the greatest power to detect nonlinearity; however, random exposure measurement error can result in a considerable loss of power to detect nonlinearity under all methods. MacMahon's method performs well for quadratic associations but does not in general recover nonlinear shapes.
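
    The central simulation finding, that classical error pulls curved shapes toward linearity, can be reproduced in a few lines. The numpy sketch below uses a quadratic log hazard as the true shape and least-squares quadratic fits as a stand-in for the survival models in the paper; with equal exposure and error variances, the fitted curvature against the mismeasured exposure shrinks by roughly the squared attenuation factor.

        # Sketch: classical error flattens and linearizes a U-shaped association.
        import numpy as np

        rng = np.random.default_rng(7)
        n = 100_000
        x = rng.normal(0.0, 1.0, n)            # true exposure
        w = x + rng.normal(0.0, 1.0, n)        # observed, classical error
        log_hazard = x ** 2                    # true U-shaped association

        print(np.polyfit(x, log_hazard, 2))    # ~ [1, 0, 0]: full curvature
        print(np.polyfit(w, log_hazard, 2))    # ~ [0.25, 0, 0.5]: curvature
                                               # scaled by lambda**2 = 0.25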